Results 1 - 20 of 39
1.
Psicothema (Oviedo); 35(1): 50-57, 2023.
Article in English | IBECS | ID: ibc-215062

ABSTRACT

Background: The emergence of digital technology in the field of psychological and educational measurement and assessment broadens the traditional concept of pencil-and-paper tests. New assessment models built on the proliferation of smartphones, social networks and software developments are opening up new horizons in the field. Method: This study is divided into four sections, each discussing the benefits and limitations of a specific type of technology-based assessment: ambulatory assessment, social networks, gamification and forced-choice testing. Results: The latest developments are clearly relevant in the field of psychological and educational measurement and assessment. Among other benefits, they bring greater ecological validity to the assessment process and eliminate the bias associated with retrospective assessment. Conclusions: Some of these new approaches point to a multidisciplinary scenario whose tradition has yet to be built. Psychometrics must secure a place in this new world by contributing sound expertise in the measurement of psychological variables. The challenges and debates facing the field of psychology as it incorporates these new approaches are also discussed.


Subjects
Humans, Information Technology/statistics & numerical data, Information Technology/trends, Educational Measurement, Technology, Psychological Tests, Social Networking, Psychometrics, Psychology
2.
Front Psychol; 12: 685326, 2021.
Article in English | MEDLINE | ID: mdl-34149573

ABSTRACT

The item wording (or keying) effect consists of logically inconsistent answers to positively and negatively worded items that tap similar (but polar opposite) content. Previous research has shown that this effect can be successfully modeled through the random intercept item factor analysis (RIIFA) model, as evidenced by improvements in model fit compared with models containing only substantive factors. However, little is known about the capability of this model to recover the uncontaminated person scores. To address this issue, this study analyzes the performance of the RIIFA approach across three types of wording effects proposed in the literature: carelessness, item verification difficulty, and acquiescence. In the context of unidimensional substantive models, four independent variables were manipulated using Monte Carlo methods: type of wording effect, amount of wording effect, sample size, and test length. The results corroborated previous findings by showing that the RIIFA models were consistently able to account for the variance in the data, attaining an excellent fit regardless of the amount of bias. Conversely, the models without the RIIFA factor produced an increasingly poor fit as the amount of wording effects grew. Surprisingly, however, the RIIFA models were not able to estimate the uncontaminated person scores better than the substantive unidimensional models for any type of wording effect. The simulation results were then corroborated with an empirical dataset, examining the relationships of learning strategies and personality with grade point average in undergraduate studies. The apparently paradoxical findings regarding model fit and the recovery of person scores are explained in light of the properties of the factor models examined.
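
For orientation, the RIIFA measurement model can be sketched as follows (our notation, a minimal rendering of the random intercept specification, with reverse-worded items recoded):

```latex
% RIIFA: each response loads on the substantive factor plus a
% person-specific random intercept whose loadings are fixed to 1.
x_{ij} = \mu_j + \lambda_j \theta_i + \gamma_i + \varepsilon_{ij},
\qquad \operatorname{Cov}(\theta_i, \gamma_i) = 0
```

Here \(\theta_i\) is the substantive trait and \(\gamma_i\) is the random intercept that absorbs consistent wording-driven responding; the unit loadings are what let the model soak up wording variance without altering the substantive factor.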

3.
Front Psychol; 12: 614470, 2021.
Article in English | MEDLINE | ID: mdl-33658962

ABSTRACT

Cognitive diagnosis models (CDMs) allow classifying respondents into a set of discrete attribute profiles. The internal structure of the test is determined by a Q-matrix, whose correct specification is necessary to achieve accurate attribute profile classification. Several empirical Q-matrix estimation and validation methods have been proposed with the aim of providing well-specified Q-matrices. However, these methods require the number of attributes to be set in advance. No systematic studies of dimensionality assessment in CDMs have been conducted, which contrasts with the vast literature in the factor analysis framework. To address this gap, the present study evaluates the performance of several dimensionality assessment methods from the factor analysis literature in determining the number of attributes in the context of CDMs. The explored methods were parallel analysis, minimum average partial, very simple structure, DETECT, empirical Kaiser criterion, exploratory graph analysis, and a machine-learning factor forest model. Additionally, a model-comparison approach was considered, which consists of comparing the model fit of empirically estimated Q-matrices. The performance of these methods was assessed by means of a comprehensive simulation study that varied the generating number of attributes, item quality, sample size, ratio of the number of items to the number of attributes, correlations among the attributes, attribute thresholds, and generating CDM. Results showed that parallel analysis (with Pearson correlations and the mean-eigenvalue criterion), the factor forest model, and model comparison (with AIC) are suitable alternatives for determining the number of attributes in CDM applications, with an overall percentage of correct estimates above 76% across conditions. The accuracy increased to 97% when these three methods agreed on the number of attributes. In short, the present study supports the use of these three methods in assessing the dimensionality of CDMs. This will make it possible to test the assumption of correct dimensionality underlying the Q-matrix estimation and validation methods, as well as to gather validity evidence to support the use of the scores obtained with these models. The findings are illustrated using real data from an intelligence test to provide guidelines for assessing the dimensionality of CDM data in applied settings.
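
As a concrete reference for the best-performing combination, below is a minimal sketch of Horn's parallel analysis with Pearson correlations and the mean-eigenvalue criterion. This is an illustrative implementation, not the study's code; dichotomous CDM responses are simply treated as numeric here.

```python
import numpy as np

def parallel_analysis(data, n_reps=100, seed=1):
    """Horn's parallel analysis: retain leading eigenvalues of the observed
    correlation matrix that exceed the mean eigenvalues of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    ref = np.zeros((n_reps, p))
    for r in range(n_reps):
        random_data = rng.standard_normal((n, p))
        ref[r] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]
    # count leading observed eigenvalues above the random-data mean
    k = 0
    for obs, ref_mean in zip(obs_eigs, ref.mean(axis=0)):
        if obs > ref_mean:
            k += 1
        else:
            break
    return k
```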

4.
Appl Psychol Meas; 45(2): 112-129, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33627917

ABSTRACT

Decisions on how to calibrate an item bank can have major implications for the subsequent performance of adaptive algorithms. One of these decisions is model selection, which can become problematic in the context of cognitive diagnosis computerized adaptive testing given the wide range of models available. This article aims to determine whether model selection indices can be used to improve the performance of adaptive tests. Three factors were considered in a simulation study: calibration sample size, Q-matrix complexity, and item bank length. Results based on the true item parameters, on general model estimates, and on single reduced model estimates were compared with those based on the combination of models selected for each item. The results indicate that fitting a single reduced model or a general model will not generally provide optimal results. Results based on the combination of models selected by the fit index were always closer to those obtained with the true item parameters. The implications for practical settings include an improvement in classification accuracy and, consequently, testing time, and a more balanced use of the item bank. An R package, named cdcatR, was developed to facilitate adaptive applications in this context.
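
The item-level model-comparison logic can be sketched briefly. Since cdcatR is an R package and its API is not shown here, the following Python sketch is purely illustrative; `fit_item_model` is a hypothetical stand-in for an item calibration routine, not a real cdcatR function.

```python
# Illustrative sketch only: for each item, pick the reduced or general
# CDM with the lowest AIC, then calibrate the bank with that combination.
CANDIDATE_MODELS = ["DINA", "DINO", "ACDM", "GDINA"]

def select_item_models(responses, q_matrix, fit_item_model):
    """responses: n x J numpy array; q_matrix: iterable of J q-vectors.
    fit_item_model returns (log-likelihood, number of parameters)."""
    chosen = []
    for j, q_vector in enumerate(q_matrix):
        fits = {}
        for model in CANDIDATE_MODELS:
            loglik, n_params = fit_item_model(responses[:, j], q_vector, model)
            fits[model] = 2 * n_params - 2 * loglik  # AIC
        chosen.append(min(fits, key=fits.get))
    return chosen
```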

5.
Br J Math Stat Psychol; 74 Suppl 1: 110-130, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33231301

ABSTRACT

The Q-matrix identifies the subset of attributes measured by each item in the cognitive diagnosis modelling framework. Usually constructed by domain experts, the Q-matrix might contain some misspecifications, disrupting classification accuracy. Empirical Q-matrix validation methods such as the general discrimination index (GDI) and the Wald test have shown promising results in addressing this problem. However, both methods rely on a cutoff point, which might be suboptimal. To address this limitation, the Hull method is proposed and evaluated in the present study. This method aims to find the optimal balance between fit and parsimony, and it is flexible enough to be used either with a measure of item discrimination (the proportion of variance accounted for, PVAF) or with a coefficient of determination (pseudo-R²). Results from a simulation study indicated that the Hull method consistently showed the best performance and the shortest computation time, especially when used with the PVAF. The Wald method also performed very well overall, while the GDI method obtained poor results when the number of attributes was high. The absence of a cutoff point gives the Hull method greater flexibility and places it as a comprehensive solution to the Q-matrix specification problem in applied settings. The proposal is illustrated using real data.
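
A minimal sketch of the Hull heuristic at the heart of the method: scan the fit-versus-complexity curve (e.g., PVAF against the number of attributes specified in the candidate q-vector) and keep the solution where the curve flattens most sharply. Illustrative only; the full method also screens out points lying under the convex hull first.

```python
def hull_select(fit, complexity):
    """fit (e.g., PVAF) and complexity, ordered by increasing complexity.
    Returns the index of the elbow: the point where the gain in fit per
    unit of added complexity drops most sharply afterwards."""
    best_idx, best_st = None, -1.0
    for i in range(1, len(fit) - 1):
        gain_prev = (fit[i] - fit[i - 1]) / (complexity[i] - complexity[i - 1])
        gain_next = (fit[i + 1] - fit[i]) / (complexity[i + 1] - complexity[i])
        st = gain_prev / max(gain_next, 1e-12)  # how sharply the curve flattens
        if st > best_st:
            best_idx, best_st = i, st
    return best_idx
```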


Subjects
Statistical Models, Research Design, Computer Simulation, Psychometrics
6.
Psicothema (Oviedo); 32(4): 549-558, Nov. 2020. tables, graphs
Article in English | IBECS | ID: ibc-201327

ABSTRACT

BACKGROUND: Unproctored Internet Tests (UIT) are vulnerable to cheating attempts by candidates seeking higher scores. To prevent this, follow-up procedures such as a verification test (VT) are carried out. This study compares five statistics used to detect cheating in Computerized Adaptive Tests (CATs): Guo and Drasgow's Z-test, the Adaptive Measure of Change (AMC), the Likelihood Ratio Test (LRT), the Score Test, and the Modified Signed Likelihood Ratio Test (MSLRT). METHOD: We simulated responses from honest and cheating candidates to the UIT and the VT. Honest candidates responded to both the UIT and the VT at their real ability level, while cheating candidates responded at their real level only in the VT, with different degrees of cheating simulated in the UIT. We applied the hypothesis tests and obtained Type I error and power rates. RESULTS: Although we found differences in Type I error rates between some of the procedures, all of them reported quite accurate results with the exception of the Score Test. The power rates obtained point to the MSLRT's superiority in detecting cheating. CONCLUSIONS: We consider the MSLRT the best option, as it has the highest power rate and a suitable Type I error rate.
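
For reference, Guo and Drasgow's Z-test, as commonly stated, compares the observed number-correct score X on the k-item verification test with its expectation under the ability estimate obtained in the unproctored stage:

```latex
Z = \frac{X - \sum_{j=1}^{k} P_j(\hat{\theta}_{\mathrm{UIT}})}
         {\sqrt{\sum_{j=1}^{k} P_j(\hat{\theta}_{\mathrm{UIT}})
                \bigl(1 - P_j(\hat{\theta}_{\mathrm{UIT}})\bigr)}}
```

Markedly negative values indicate verification performance below what the unproctored estimate predicts, flagging possible cheating.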


Subjects
Humans, Lie Detection, Deception, Internet, Educational Measurement/statistics & numerical data, Software Validation, Psychological Tests/standards, Psychological Tests/statistics & numerical data, Educational Measurement/standards, Analysis of Variance, ROC Curve
7.
Psicothema (Oviedo); 32(4): 607-614, Nov. 2020. tables, graphs, illustrations
Article in English | IBECS | ID: ibc-201334

ABSTRACT

BACKGROUND: Due to its flexibility and statistical properties, bi-factor Exploratory Structural Equation Modeling (bi-factor ESEM) has become an often-recommended tool in psychometrics. Unfortunately, the most recent methods for approximating these structures, such as the SLiD algorithm, are not available in the leading software for performing ESEM (i.e., Mplus). To resolve this issue, we present a novel, user-friendly Shiny application that integrates the SLiD algorithm into bi-factor ESEM estimation in Mplus. To this end, a two-stage framework for conducting SLiD-based bi-factor ESEM in Mplus was developed. METHOD: This approach is presented in a step-by-step guide for applied researchers, showing the utility of the developed SLiDApp application. Using data from the Open-Source Psychometrics Project (N = 2,495), we conducted a bi-factor ESEM exploration of the Generic Conspiracist Beliefs Scale. We studied whether bi-factor modelling was appropriate and whether both general and group factors were related to each personality trait. RESULTS: The application of the SLiD algorithm provided unique information regarding this factor structure and its ESEM structural parameters. CONCLUSIONS: The results illustrate the usefulness and validity of SLiD-based bi-factor ESEM, and how the proposed Shiny app makes it easier for applied researchers to use these methods.
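
SLiD belongs to the Schmid-Leiman family of bi-factor approximations. For orientation only, here is the classic Schmid-Leiman orthogonalization that this family builds on; this is not the SLiD algorithm itself, which refines such solutions with iterative target rotation.

```python
import numpy as np

def schmid_leiman(first_order, second_order):
    """Classic Schmid-Leiman orthogonalization (orientation only).
    first_order:  p x k loadings of items on correlated group factors
    second_order: k loadings of group factors on a general factor"""
    first_order = np.asarray(first_order, dtype=float)
    gamma = np.asarray(second_order, dtype=float).reshape(-1)
    general = first_order @ gamma                       # general-factor loadings
    residual = np.sqrt(np.clip(1.0 - gamma**2, 0, 1))   # second-order uniquenesses
    group = first_order * residual                      # residualized group loadings
    return general, group
```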


Subjects
Humans, Brief Psychiatric Rating Scale/statistics & numerical data, Factor Analysis, Psychological Models, Theoretical Models, Statistical Data Interpretation, Latent Class Analysis, Algorithms, Social Support, Social Values, Brief Psychiatric Rating Scale/standards, Personality Inventory/statistics & numerical data
8.
Appl Psychol Meas; 44(6): 431-446, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32788815

ABSTRACT

In the context of cognitive diagnosis models (CDMs), a Q-matrix reflects the correspondence between attributes and items. The Q-matrix construction process is typically subjective in nature, which may lead to misspecifications. All of this can negatively affect attribute classification accuracy. In response, several methods of empirical Q-matrix validation have been developed. The general discrimination index (GDI) method has some relevant advantages, such as the possibility of being applied to several CDMs. However, the estimation of the GDI relies on the estimation of the latent group sizes and success probabilities, which is made with the original (possibly misspecified) Q-matrix. This can be a problem, especially in situations where there is great uncertainty about the Q-matrix specification. To address this, the present study investigates the iterative application of the GDI method, where only one item is modified at each step of the iterative procedure and the required cutoff is updated considering the new parameter estimates. A simulation study was conducted to test the performance of the new procedure. Results showed that the performance of the GDI method improved when it was applied iteratively at the item level with an appropriate cutoff point. This was most notable when the original Q-matrix misspecification rate was high, where the proposed procedure performed better 96.5% of the time. The results are illustrated using Tatsuoka's fraction-subtraction data set.
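
To make the quantity concrete, here is a minimal sketch of the GDI for one candidate q-vector: the size-weighted variance of the latent groups' correct-response probabilities, with the PVAF then expressing it relative to the full q-vector. The inputs are assumed to be already estimated; this sketches the index itself, not its estimation.

```python
import numpy as np

def gdi(group_sizes, group_success_probs):
    """General discrimination index for one candidate q-vector:
    size-weighted variance of the latent groups' success probabilities."""
    p = np.asarray(group_sizes, dtype=float)
    p = p / p.sum()
    probs = np.asarray(group_success_probs, dtype=float)
    mean = np.dot(p, probs)
    return np.dot(p, (probs - mean) ** 2)

def pvaf(candidate_gdi, full_qvector_gdi):
    # proportion of variance accounted for, relative to the full q-vector
    return candidate_gdi / full_qvector_gdi
```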

9.
PLoS One; 15(1): e0227196, 2020.
Article in English | MEDLINE | ID: mdl-31923227

ABSTRACT

Currently, there are two predominant approaches in adaptive testing. One, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT), is based on cognitive diagnosis models; the other, traditional CAT, is based on item response theory. The present study evaluates the performance of two item selection rules (ISRs) originally developed in the CD-CAT framework, the double Kullback-Leibler information (DKL) and the generalized deterministic inputs, noisy "and" gate model discrimination index (GDI), in the context of traditional CAT. The accuracy and test security associated with these two ISRs are compared to those of point Fisher information and weighted KL using a simulation study. The impact of the trait level estimation method is also investigated. The results show that the new ISRs, particularly DKL, could be used to improve the accuracy of CAT. The better accuracy of DKL is achieved at the expense of a higher item overlap rate. Differences among the item selection rules become smaller as the test gets longer. The two CD-CAT ISRs select different types of items: DKL favors items with the highest possible a-parameter, while GDI favors items with the lowest possible c-parameter. Regarding the trait level estimator, the expected a posteriori method is generally better in the first stages of the CAT and converges with the maximum likelihood method when a medium to large number of items is involved. The use of DKL can be recommended in low-stakes settings where test security is less of a concern.
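
As a baseline for comparison, the point Fisher information rule can be sketched as follows for a 3PL bank, using the standard item-information formula. This is an illustrative implementation, not the study's code.

```python
import numpy as np

D = 1.7  # usual logistic scaling constant

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response; a, b, c are numpy arrays."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

def fisher_info_3pl(theta, a, b, c):
    """Item information at ability theta under the 3PL model."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def next_item(theta_hat, a, b, c, administered):
    """Point Fisher information rule: most informative unused item."""
    info = fisher_info_3pl(theta_hat, a, b, c)
    info[administered] = -np.inf  # exclude items already given
    return int(np.argmax(info))
```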


Subjects
Cognition, Educational Measurement/methods, Psychometrics/methods, Algorithms, Bayes Theorem, Bias, Computer Simulation, Computers, Data Accuracy, Humans
10.
Educ Psychol Meas; 79(4): 727-753, 2019 Aug.
Article in English | MEDLINE | ID: mdl-32655181

ABSTRACT

Cognitive diagnosis models (CDMs) are latent class multidimensional statistical models that help classify people accurately by using a set of discrete latent variables, commonly referred to as attributes. These models require a Q-matrix that indicates the attributes involved in each item. A potential problem is that the Q-matrix construction process, typically performed by domain experts, is subjective in nature. This might lead to Q-matrix misspecifications that produce inaccurate classifications. For this reason, several empirical Q-matrix validation methods have been developed in recent years. de la Torre and Chiu proposed one of the most popular methods, based on a discrimination index. However, some questions about the usefulness of the method with empirical data remained open due to the restricted number of conditions examined and the use of a single cutoff point (EPS) regardless of the data conditions. This article includes two simulation studies that test this validation method under a wider range of conditions, with the purpose of giving it greater generalizability and of empirically determining the most suitable EPS for the data conditions at hand. Results show a good overall performance of the method, the relevance of the different factors studied, and that using a single indiscriminate EPS is not acceptable. Specific guidelines for selecting an appropriate EPS are provided in the discussion.
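
In outline, the role of the EPS cutoff (our paraphrase of the discrimination-index rule): for each item, retain the least complex candidate q-vector whose proportion of variance accounted for clears the threshold,

```latex
q_j^{*} \;=\; \arg\min_{q}\; \lVert q \rVert_{1}
\quad \text{subject to} \quad
\mathrm{PVAF}(q) \;\geq\; \mathrm{EPS}
```

so the chosen EPS directly trades parsimony against retained discrimination, which is why a single indiscriminate value fails across data conditions.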

11.
Span J Psychol; 21: E62, 2018 Dec 03.
Article in English | MEDLINE | ID: mdl-30501646

ABSTRACT

This study analyses the extent to which cheating occurs in a real selection setting. A two-stage, unproctored and proctored, test administration was considered. Test score inconsistencies were identified by applying a verification test (the Guo and Drasgow Z-test). An initial simulation study showed that the Z-test has adequate Type I error and power rates in the specific selection settings explored. A second study applied the Z-test verification procedure to a sample of 954 employment candidates. Additional external evidence based on response times to the verification items was gathered. The results revealed a good performance of the Z-test statistic and a relatively low, but non-negligible, number of suspected cheaters, who showed upwardly distorted ability estimates. The study with real data provided additional information on the presence of suspected cheating in unproctored applications and on the viability of using item response times as additional evidence of cheating. In the verification test, suspected cheaters spent 5.78 seconds more per item than expected given the item difficulty and their assumed ability from the unproctored stage. The percentage of suspected cheaters in the empirical study was estimated at 13.84%. In summary, the study provides evidence of the usefulness of the Z-test in detecting cheating in a specific setting, in which a computerized adaptive test of English grammar knowledge was used for personnel selection.
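
The response-time evidence can be formalized with, for instance, van der Linden's lognormal response-time model, one standard choice (the abstract does not specify the model actually used):

```latex
\log T_{ij} \sim \mathcal{N}\!\left(\beta_j - \tau_i,\; \alpha_j^{-2}\right)
```

where \(\tau_i\) is the examinee's speed, \(\beta_j\) the item's time intensity, and \(\alpha_j\) its time discrimination. Under such a model, suspected cheaters show positive time residuals on verification items, consistent with the 5.78-seconds-per-item excess reported above.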


Assuntos
Enganação , Avaliação Educacional/normas , Internet , Seleção de Pessoal/normas , Adulto , Feminino , Humanos , Masculino
12.
Front Psychol; 9: 2540, 2018.
Article in English | MEDLINE | ID: mdl-30618961

ABSTRACT

This paper presents a new two-dimensional Multiple-Choice Model accounting for Omissions (MCMO). Based on Thissen and Steinberg's multiple-choice models, the MCMO defines omitted responses as the result of the respondent not knowing the correct answer and deciding to omit rather than guess, given a latent propensity to omit. First, using a Monte Carlo simulation, the accuracy of the parameters estimated from data with different sample sizes (500, 1,000, and 2,000 subjects), test lengths (20, 40, and 80 items), and percentages of omissions (5, 10, and 15%) was investigated. Then, the appropriateness of the MCMO for the Trends in International Mathematics and Science Study (TIMSS) Advanced 2015 mathematics and physics multiple-choice items was analyzed and compared with Holman and Glas's between-item multidimensional IRT model (B-MIRT) and with the three-parameter logistic (3PL) model with omissions treated as incorrect responses. The results of the simulation study showed good recovery of scale and position parameters. Pseudo-guessing parameters (d) were less accurate, but this inaccuracy did not seem to have an important effect on the estimation of abilities. The precision of the propensity to omit strongly depended on the ability values (the higher the ability, the worse the estimate of the propensity to omit). In the empirical study, the empirical reliability of the ability estimates was high in both physics and mathematics. As in the simulation study, the estimates of the propensity to omit were less reliable, and their precision varied with ability. Regarding absolute item fit, the MCMO fitted the data better than the other models. The MCMO also offered significant increments in convergent validity between scores from multiple-choice and constructed-response items, with an increase of around 0.02 to 0.04 in R² compared with the two other methods. Finally, the high correlation between country means of the propensity to omit in mathematics and physics suggests that (1) the propensity to omit is somehow affected by the examinees' country of residence, and (2) the propensity to omit is independent of test content.
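
As a loose illustration of the two-process idea only (not the authors' exact parameterization), the omission probability can be written as the joint event of not knowing the answer and choosing to omit given the latent propensity \(\omega_i\):

```latex
% Illustrative two-process sketch; delta_j is a hypothetical item-level
% omission parameter, not taken from the MCMO paper.
P(\text{omit}_{ij}) \;=\; \bigl(1 - P^{\text{know}}_{ij}\bigr)\,
\frac{e^{\omega_i + \delta_j}}{1 + e^{\omega_i + \delta_j}}
```

The MCMO embeds this kind of logic within the Thissen and Steinberg multiple-choice framework rather than in the simplified form shown here.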

13.
Span. J. Psychol; 21: e62.1-e62.10, 2018. tables, graphs
Article in English | IBECS | ID: ibc-189177

ABSTRACT

This study analyses the extent to which cheating occurs in a real selection setting. A two-stage, unproctored and proctored, test administration was considered. Test score inconsistencies were identified by applying a verification test (the Guo and Drasgow Z-test). An initial simulation study showed that the Z-test has adequate Type I error and power rates in the specific selection settings explored. A second study applied the Z-test verification procedure to a sample of 954 employment candidates. Additional external evidence based on response times to the verification items was gathered. The results revealed a good performance of the Z-test statistic and a relatively low, but non-negligible, number of suspected cheaters, who showed upwardly distorted ability estimates. The study with real data provided additional information on the presence of suspected cheating in unproctored applications and on the viability of using item response times as additional evidence of cheating. In the verification test, suspected cheaters spent 5.78 seconds more per item than expected given the item difficulty and their assumed ability from the unproctored stage. The percentage of suspected cheaters in the empirical study was estimated at 13.84%. In summary, the study provides evidence of the usefulness of the Z-test in detecting cheating in a specific setting, in which a computerized adaptive test of English grammar knowledge was used for personnel selection.


Subjects
Humans, Male, Female, Adult, Deception, Educational Measurement/standards, Internet, Personnel Selection/standards
14.
Psychol Methods; 21(1): 93-111, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26651983

ABSTRACT

An early step in the process of construct validation consists of establishing the fit of an unrestricted "exploratory" factorial model for a prespecified number of common factors. For this initial unrestricted model, researchers have often recommended and used fit indices to estimate the number of factors to retain. Despite the logical appeal of this approach, little is known about the actual accuracy of fit indices in estimating data dimensionality. The present study aimed to reduce this gap by systematically evaluating the performance of four commonly used fit indices, namely the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR), in estimating the number of factors with categorical variables, and comparing it with what is arguably the current gold standard, Horn's (1965) parallel analysis. The results indicate that the CFI and TLI provide nearly identical estimations and are the most accurate fit indices, followed at a step below by the RMSEA, and then by the SRMR, which gives notably poor dimensionality estimates. Difficulties in establishing optimal cutoff values for the fit indices and the general superiority of parallel analysis, however, suggest that applied researchers are better served by complementing their theoretical considerations regarding dimensionality with the estimates provided by the latter method.
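
For reference, the three chi-square-based indices compared here are standardly defined, for fitted model M and baseline model B estimated on N observations, as:

```latex
\mathrm{CFI} = 1 - \frac{\max\!\left(\chi^2_M - df_M,\, 0\right)}
                        {\max\!\left(\chi^2_B - df_B,\; \chi^2_M - df_M,\; 0\right)},
\qquad
\mathrm{TLI} = \frac{\chi^2_B/df_B - \chi^2_M/df_M}{\chi^2_B/df_B - 1},
\qquad
\mathrm{RMSEA} = \sqrt{\frac{\max\!\left(\chi^2_M - df_M,\, 0\right)}{df_M\,(N-1)}}
```

The SRMR, by contrast, is the root mean squared standardized residual of the observed versus model-implied correlations, a difference in construction that may help explain its distinct behavior here.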


Subjects
Statistical Data Interpretation, Statistical Models, Monte Carlo Method, Humans
15.
Span. J. Psychol; 17: e48.1-e48.9, Jan.-Dec. 2014. illustrations
Article in English | IBECS | ID: ibc-130460

ABSTRACT

Test security can be a major problem in computerized adaptive testing, as examinees can share information about the items they receive. Of the different item selection rules proposed to alleviate this risk, stratified methods are among those that have received the most attention. In these methods, only low-discrimination items can be presented at the beginning of the test, and the mean information of the items increases as the test goes on. To do so, the item bank must be divided into several strata according to the information of the items. To date, there is no clear guidance about the optimal number of strata into which the item bank should be split. In this study, we simulate conditions with different numbers of strata, from 1 (no stratification) to a number of strata equal to test length (maximum level of stratification), while manipulating the maximum exposure rate that no item should surpass (r_max) in its whole domain. In this way, we can plot the relation between test security and accuracy, making it possible to determine the number of strata that leads to better security while holding measurement accuracy constant. Our data indicate that the best option is to stratify into as many strata as possible.
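
A minimal sketch of the stratified design being tuned here, in the spirit of a-stratified selection with b-matching (illustrative, not the simulation code):

```python
import numpy as np

def a_stratified_bank(a_params, n_strata):
    """Split the bank into strata of increasing discrimination (a)."""
    order = np.argsort(a_params)
    return np.array_split(order, n_strata)

def select_item(theta_hat, b_params, stratum, administered):
    """Within the active stratum, pick the unused item whose difficulty
    is closest to the current ability estimate (b-matching)."""
    candidates = [j for j in stratum if j not in administered]
    return min(candidates, key=lambda j: abs(b_params[j] - theta_hat))
```

With n_strata = 1 this reduces to unstratified b-matching; with n_strata equal to test length, each administered item comes from its own stratum, the maximum stratification level examined in the study.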


Subjects
Humans, Male, Female, Psychological Tests/statistics & numerical data, Psychological Tests/standards, Psychological Discrimination/physiology, Psychometrics/methods, Psychometrics/standards, Psychometrics/trends, Computer Security/trends, Psychometrics/organization & administration, Psychometrics/statistics & numerical data
16.
Psicothema (Oviedo); 26(3): 395-400, Aug. 2014. tables
Article in English | IBECS | ID: ibc-130059

ABSTRACT

BACKGROUND: Exploratory Factor Analysis (EFA) is one of the most commonly used procedures in the social and behavioral sciences. However, it is also one of the most criticized, owing to the questionable way researchers typically apply it. The main goal is to examine the correspondence between the practices usually considered most appropriate and the actual decisions made by researchers. METHOD: The use of exploratory factor analysis is examined in 117 papers published between 2011 and 2012 in the 3 Spanish psychological journals with the highest impact over the previous five years. RESULTS: Results show significant rates of questionable decisions in conducting EFA, based on unjustified or mistaken choices regarding the method of extraction, retention, and rotation of factors. CONCLUSIONS: Overall, the current review provides support for some guidelines on how to apply and report an EFA.


Subjects
Humans, Psychometrics/instrumentation, Factor Analysis, Validation Studies as Topic, Psychiatric Rating Scales
17.
Psicothema; 26(3): 395-400, 2014.
Article in English | MEDLINE | ID: mdl-25069561

ABSTRACT

BACKGROUND: Exploratory Factor Analysis (EFA) is one of the most commonly used procedures in the social and behavioral sciences. However, it is also one of the most criticized, owing to the questionable way researchers typically apply it. The main goal is to examine the correspondence between the practices usually considered most appropriate and the actual decisions made by researchers. METHOD: The use of exploratory factor analysis is examined in 117 papers published between 2011 and 2012 in the 3 Spanish psychological journals with the highest impact over the previous five years. RESULTS: Results show significant rates of questionable decisions in conducting EFA, based on unjustified or mistaken choices regarding the method of extraction, retention, and rotation of factors. CONCLUSIONS: Overall, the current review provides support for some guidelines on how to apply and report an EFA.


Subjects
Factor Analysis, Validation Studies as Topic, Guidelines as Topic
18.
Span J Psychol; 17: E48, 2014.
Article in English | MEDLINE | ID: mdl-25012203

ABSTRACT

Test security can be a major problem in computerized adaptive testing, as examinees can share information about the items they receive. Of the different item selection rules proposed to alleviate this risk, stratified methods are among those that have received the most attention. In these methods, only low-discrimination items can be presented at the beginning of the test, and the mean information of the items increases as the test goes on. To do so, the item bank must be divided into several strata according to the information of the items. To date, there is no clear guidance about the optimal number of strata into which the item bank should be split. In this study, we simulate conditions with different numbers of strata, from 1 (no stratification) to a number of strata equal to test length (maximum level of stratification), while manipulating the maximum exposure rate that no item should surpass (r_max) in its whole domain. In this way, we can plot the relation between test security and accuracy, making it possible to determine the number of strata that leads to better security while holding measurement accuracy constant. Our data indicate that the best option is to stratify into as many strata as possible.


Assuntos
Metodologias Computacionais , Avaliação Educacional/normas , Psicometria/normas , Avaliação Educacional/métodos , Humanos , Psicometria/métodos
19.
Psicothema (Oviedo); 25(2): 238-244, Apr.-Jun. 2013.
Article in English | IBECS | ID: ibc-112236

ABSTRACT

Background: Criterion-referenced interpretations of tests are highly necessary, and they usually involve the difficult task of establishing cut scores. In contrast with other Item Response Theory (IRT)-based standard-setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. Method: eCat-Listening, a computerized adaptive test for the evaluation of English listening comprehension, was administered to 1,576 participants, and the proposed standard-setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). Results: The results showed a classification closely related to relevant external measures of English language proficiency, in line with the CEFR. Conclusions: It is concluded that the proposed method is a practical and valid standard-setting alternative for IRT-based test interpretations.
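
The object being transformed is the item characteristic curve; under the three-parameter logistic model common in such adaptive tests (shown for orientation, since the abstract does not detail the transformation itself):

```latex
P_j(\theta) \;=\; c_j + (1 - c_j)\,
\frac{1}{1 + e^{-D a_j (\theta - b_j)}}
```

where \(a_j\), \(b_j\), and \(c_j\) are the discrimination, difficulty, and pseudo-guessing parameters, and \(D = 1.7\) is the usual scaling constant.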


Subjects
Humans, Male, Female, Psychological Tests/statistics & numerical data, Psychological Tests/standards, Research/methods, Psychoanalytic Interpretation, Psychometrics/organization & administration, Psychometrics/standards, Information Services/trends, Information Services, Theoretical Models/methods, Psychometrics/methods, Psychometrics/statistics & numerical data
20.
Psicothema; 25(2): 238-44, 2013.
Article in English | MEDLINE | ID: mdl-23628540

ABSTRACT

BACKGROUND: Criterion-referenced interpretations of tests are highly necessary, and they usually involve the difficult task of establishing cut scores. In contrast with other Item Response Theory (IRT)-based standard-setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. METHOD: eCat-Listening, a computerized adaptive test for the evaluation of English listening comprehension, was administered to 1,576 participants, and the proposed standard-setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). RESULTS: The results showed a classification closely related to relevant external measures of English language proficiency, in line with the CEFR. CONCLUSIONS: It is concluded that the proposed method is a practical and valid standard-setting alternative for IRT-based test interpretations.


Subjects
Comprehension, Psychological Tests, Computers, Humans, Statistical Models, Psychometrics, Reproducibility of Results